This project analyzes the stocks of major European semiconductor companies. The goal is to retrieve financial data with yfinance and use it to forecast the companies' stock prices with time series analysis and machine learning. The results can inform trading and financial decision-making. Note that this project does not itself provide such decisions; it serves only as a general analysis and guideline.
Initially this project also featured an attempt at forecasting close prices with a hybrid LSTM-ARIMA model (inspired by this paper), but it was scrapped after many failed attempts. The original paper does not use the model for time series forecasting, but for trend and buy/sell signal detection. There are many other examples of LSTM being used for stock forecasting, but for a very volatile market it might not be the best fit.
Data cleansing
Code
import yfinance as yf
import random
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import time
from datetime import datetime, timedelta
from IPython.display import display  # built in inside notebooks, imported here for scripts

tickers = ["ASML.AS", "NXPI", "IFX.DE", "BESI.AS", "NOD.OL", "MELE.BR", "AIXA.DE", "SMHN.DE", "AWEVF"]
all_data = {}

yesterday = datetime.today() - timedelta(days=1)
yesterday_str = yesterday.strftime('%Y-%m-%d')

# Fetch data up to yesterday so that repeated runs give reliable results
for ticker in tickers:
    for attempt in range(3):
        try:
            stock = yf.Ticker(ticker)
            hist = stock.history(period="max", end=yesterday_str)
            if hist is None or hist.empty:
                display(f"No data for {ticker}, attempt {attempt+1}")
                time.sleep(2)
                continue
            all_data[ticker] = hist
            break
        except Exception as e:
            display(f"Error fetching {ticker}: {e}, attempt {attempt+1}")
            time.sleep(2)

# Check out ASML data as a test
if "ASML.AS" in all_data:
    display("ASML stocks tail")
    display(all_data["ASML.AS"].tail())
else:
    display("ASML.AS data not available")

# Clean and process the data into a continuous daily time series
processed_data = {}
for ticker, df in all_data.items():
    if df.empty:
        continue
    df.index = df.index.tz_localize(None)
    df_continuous = df.asfreq('D')

    # Carry the last known prices over non-trading days
    cols_to_ffill = ['Open', 'High', 'Low', 'Close', 'Adj Close']
    existing_cols = [c for c in cols_to_ffill if c in df_continuous.columns]
    df_continuous[existing_cols] = df_continuous[existing_cols].ffill()

    # No trading happened on the filled days, so their volume is zero
    if 'Volume' in df_continuous.columns:
        df_continuous['Volume'] = df_continuous['Volume'].fillna(0)

    processed_data[ticker] = df_continuous
'ASML stocks tail'
Date                              Open         High          Low        Close  Volume  Dividends  Stock Splits
2026-02-09 00:00:00+01:00  1200.000000  1205.000000  1177.400024  1204.800049  456425        1.6           0.0
2026-02-10 00:00:00+01:00  1196.000000  1212.400024  1185.800049  1193.000000  459476        0.0           0.0
2026-02-11 00:00:00+01:00  1185.400024  1224.000000  1176.599976  1207.800049  530632        0.0           0.0
2026-02-12 00:00:00+01:00  1225.000000  1225.000000  1176.599976  1179.800049  558696        0.0           0.0
2026-02-13 00:00:00+01:00  1190.599976  1210.599976  1173.800049  1190.400024  708101        0.0           0.0
Line chart plot
After cleaning and processing the data, the next step is to visualize the close prices in a line chart. Plotly offers clean, interactive visualizations for this. Its main downsides are memory use and rendering speed, which is why it is not recommended for very large datasets.
Code
fig = go.Figure()

for ticker, data in processed_data.items():
    fig.add_trace(
        go.Scatter(
            x=data.index,
            y=data['Close'],
            mode='lines',
            name=f"{ticker} Close"
        )
    )

fig.update_layout(
    title="European Semiconductor Companies - Close Prices",
    xaxis_title="Time",
    yaxis_title="Close Price (€ or $ depending on listing)",
    legend_title="Company"
)
fig.show()
Figure 1: Time series line plot
The same line chart, restricted to the last 500 days of the daily series:
Code
fig = go.Figure()

for ticker, data in processed_data.items():
    data = data.tail(500)
    print(data['Close'].min(), data['Close'].max())
    fig.add_trace(
        go.Scatter(
            x=data.index,
            y=data['Close'],
            mode='lines',
            name=f"{ticker} Close"
        )
    )

fig.update_layout(
    title="European Semiconductor Companies - Close Prices of last 500 days",
    xaxis_title="Time",
    yaxis_title="Close Price (€ or $ depending on listing)",
    legend_title="Company"
)
fig.show()
Next is the MACD analysis. MACD (Moving Average Convergence Divergence) is a commonly used technical indicator in trading. It compares a fast and a slow exponential moving average of the price to reveal trend direction and momentum, and crossovers between the MACD line and its signal line are commonly read as buy or sell signals. Zooming into the plot makes the MACD results and the candlestick plot easier to read.
ASML.AS: No Crossover → Bearish Trend
NXPI: No Crossover → Bullish Trend
IFX.DE: Cross Above Signal Line → Potential Bullish Signal
BESI.AS: No Crossover → Bearish Trend
NOD.OL: No Crossover → Bullish Trend
MELE.BR: No Crossover → Bearish Trend
AIXA.DE: No Crossover → Bullish Trend
SMHN.DE: No Crossover → Bearish Trend
AWEVF: No Crossover → Bullish Trend
Figure 3: panels (a) to (i)
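For reference, the MACD line and the crossover labels listed above can be sketched with pandas. This is a minimal sketch, not the project's own code: it assumes the conventional 12/26/9-day exponential moving average spans, the helper names `macd` and `last_crossover` are hypothetical, and the "Cross Below" label is an assumed counterpart to the labels that appear in the output.

```python
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Return the MACD line, signal line and histogram (hypothetical helper)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame(
        {"macd": macd_line, "signal": signal_line, "hist": macd_line - signal_line}
    )


def last_crossover(ind: pd.DataFrame) -> str:
    """Label the most recent bar (assumed logic, modelled on the printed labels)."""
    prev, curr = ind["hist"].iloc[-2], ind["hist"].iloc[-1]
    if prev <= 0 < curr:
        return "Cross Above Signal Line → Potential Bullish Signal"
    if prev >= 0 > curr:
        return "Cross Below Signal Line → Potential Bearish Signal"
    trend = "Bullish" if curr > 0 else "Bearish"
    return f"No Crossover → {trend} Trend"


# Toy data: a steadily rising close price, not real market data
close = pd.Series([100.0 + i for i in range(60)])
ind = macd(close)
print(last_crossover(ind))
```

On real data the same functions could be applied to `processed_data[ticker]['Close']`. The histogram (`hist`) changing sign is what marks a crossover.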
RSI analysis
The next technical indicator is the RSI (Relative Strength Index). It helps to identify overbought and oversold conditions and the buy and sell signals that follow from them. RSI and MACD are often used together, since RSI highlights momentum extremes while MACD highlights trend direction.
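As an illustration of how the RSI is computed, the block below is a minimal sketch assuming a 14-day window with Wilder's smoothing, which is the common default; the project's own RSI code and parameters are not shown here. The helper name `rsi` is hypothetical.

```python
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder's smoothing (hypothetical helper)."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # upward moves, 0 otherwise
    loss = (-delta).clip(lower=0)     # downward moves as positive numbers
    # Wilder's smoothing is an exponential average with alpha = 1 / period
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Toy data, not real market data
rising = pd.Series([float(i) for i in range(1, 40)])
falling = pd.Series([float(i) for i in range(40, 1, -1)])
print(rsi(rising).iloc[-1], rsi(falling).iloc[-1])
```

Readings above 70 are conventionally treated as overbought and readings below 30 as oversold; those thresholds are a convention rather than something fixed by the formula.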
The Q-Q plot is a useful sanity check for the return data. It shows whether the daily returns follow a reference distribution, here the standard normal. Points lying on a straight line indicate agreement with that distribution, while points bending away from the line at the ends indicate heavier tails.
Code
import statsmodels.api as sm
import matplotlib.pyplot as plt
from arch import arch_model
import datetime as dt
from scipy.stats import norm

for ticker, data in all_data.items():
    # Daily returns in percent
    returns = 100 * data['Close'].pct_change().dropna()

    # Sort the sample
    sorted_returns = np.sort(returns)
    n = len(sorted_returns)

    # Compute theoretical quantiles from standard normal
    p = (np.arange(1, n + 1) - 0.5) / n
    theoretical_quantiles = norm.ppf(p)

    # Reference line (45-degree line); the returns are not standardized,
    # so the points only follow this line if their variance is close to 1
    ref_line = [theoretical_quantiles.min(), theoretical_quantiles.max()]

    # Plot with Plotly
    qq_fig = go.Figure()

    # Scatter points
    qq_fig.add_trace(go.Scatter(
        x=theoretical_quantiles,
        y=sorted_returns,
        mode='markers',
        name='Data'
    ))

    # 45-degree reference line
    qq_fig.add_trace(go.Scatter(
        x=ref_line,
        y=ref_line,
        mode='lines',
        line=dict(color='red', dash='dash'),
        name='Fit Line'
    ))

    qq_fig.update_layout(
        title=f'{ticker} Returns Q-Q Plot',
        xaxis_title='Theoretical Quantiles',
        yaxis_title='Sample Quantiles'
    )
    qq_fig.show()
Figure 5: panels (a) to (i)
GARCH model
The GARCH (Generalized Autoregressive Conditional Heteroscedasticity) model is a popular statistical model for time series analysis, especially in trading and quantitative finance. Its main application in finance is to estimate and forecast market volatility. This is especially important for volatile, risk-sensitive markets such as the semiconductor market.
Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1       h.2       h.3       h.4       h.5
2026-02-10  8.343022  8.432228  8.521434  8.610640  8.699846
2026-02-11  8.034235  8.123442  8.212648  8.301854  8.391060
2026-02-12  7.965153  8.054359  8.143565  8.232772  8.321978
2026-02-13  7.635271  7.724478  7.813684  7.902890  7.992096
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -8841.72
Distribution:    Normal               AIC:               17691.4
Method:          Maximum Likelihood   BIC:               17716.5
                                      No. Observations:  3901
Date:            Mon, Feb 16 2026     Df Residuals:      3900
Time:            15:21:10             Df Model:          1

Mean Model
          coef        std err    t       P>|t|      95.0% Conf. Int.
mu        0.0996      2.612e-02  3.813   1.373e-04  [4.839e-02, 0.151]

Volatility Model
          coef        std err    t       P>|t|      95.0% Conf. Int.
omega     8.8983e-03  6.569e-03  1.355   0.176      [-3.977e-03,2.177e-02]
alpha[1]  0.0722      1.229e-02  5.881   4.083e-09  [4.817e-02,9.633e-02]
beta[1]   0.9278      1.428e-02  64.971  0.000      [ 0.900, 0.956]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -243492.
Distribution:    Normal                     AIC:               486991.
Method:          User-specified Parameters  BIC:               487016.
                                            No. Observations:  3904
Date:            Mon, Feb 16 2026
Time:            15:21:10

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1       h.2       h.3       h.4       h.5
2026-02-10  6.911105  6.920091  6.929077  6.938063  6.947048
2026-02-11  8.566262  8.575268  8.584275  8.593281  8.602288
2026-02-12  8.662645  8.671653  8.680661  8.689668  8.698677
2026-02-13  8.095002  8.104003  8.113004  8.122004  8.131005
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -15452.4
Distribution:    Normal               AIC:               30912.8
Method:          Maximum Likelihood   BIC:               30940.0
                                      No. Observations:  6621
Date:            Mon, Feb 16 2026     Df Residuals:      6620
Time:            15:21:10             Df Model:          1

Mean Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
mu        0.0881  2.783e-02  3.168   1.537e-03  [3.360e-02, 0.143]

Volatility Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
omega     0.0664  2.557e-02  2.595   9.458e-03  [1.624e-02, 0.116]
alpha[1]  0.0540  1.119e-02  4.826   1.396e-06  [3.208e-02,7.596e-02]
beta[1]   0.9370  1.325e-02  70.708  0.000      [ 0.911, 0.963]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -502675.
Distribution:    Normal                     AIC:               1.00536e+06
Method:          User-specified Parameters  BIC:               1.00539e+06
                                            No. Observations:  6624
Date:            Mon, Feb 16 2026
Time:            15:21:10

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1       h.2       h.3       h.4       h.5
2026-02-10  4.625270  4.650100  4.674707  4.699094  4.723261
2026-02-11  4.508649  4.534527  4.560172  4.585586  4.610773
2026-02-12  4.472842  4.499041  4.525004  4.550735  4.576234
2026-02-13  4.411641  4.438389  4.464897  4.491167  4.517201
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -16940.8
Distribution:    Normal               AIC:               33889.6
Method:          Maximum Likelihood   BIC:               33917.1
                                      No. Observations:  7093
Date:            Mon, Feb 16 2026     Df Residuals:      7092
Time:            15:21:11             Df Model:          1

Mean Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
mu        0.1554  2.890e-02  5.377   7.583e-08  [9.874e-02, 0.212]

Volatility Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
omega     0.0700  2.654e-02  2.640   8.300e-03  [1.804e-02, 0.122]
alpha[1]  0.0483  9.928e-03  4.865   1.146e-06  [2.884e-02,6.775e-02]
beta[1]   0.9446  1.167e-02  80.912  0.000      [ 0.922, 0.967]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -578098.
Distribution:    Normal                     AIC:               1.15620e+06
Method:          User-specified Parameters  BIC:               1.15623e+06
                                            No. Observations:  7096
Date:            Mon, Feb 16 2026
Time:            15:21:11

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1       h.2       h.3       h.4       h.5
2026-02-10  8.258730  8.270010  8.281211  8.292331  8.303373
2026-02-11  7.913730  7.927465  7.941103  7.954643  7.968088
2026-02-12  7.645786  7.661428  7.676958  7.692378  7.707689
2026-02-13  8.181165  8.192997  8.204746  8.216410  8.227992
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -17462.0
Distribution:    Normal               AIC:               34931.9
Method:          Maximum Likelihood   BIC:               34959.1
                                      No. Observations:  6620
Date:            Mon, Feb 16 2026     Df Residuals:      6619
Time:            15:21:11             Df Model:          1

Mean Model
          coef    std err    t      P>|t|      95.0% Conf. Int.
mu        0.1177  4.077e-02  2.887  3.892e-03  [3.779e-02, 0.198]

Volatility Model
          coef    std err    t      P>|t|      95.0% Conf. Int.
omega     1.1929  0.892      1.337  0.181      [ -0.556, 2.941]
alpha[1]  0.0587  3.394e-02  1.730  8.364e-02  [-7.806e-03, 0.125]
beta[1]   0.8456  9.160e-02  9.231  2.677e-20  [ 0.666, 1.025]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -2.62349e+06
Distribution:    Normal                     AIC:               5.24699e+06
Method:          User-specified Parameters  BIC:               5.24702e+06
                                            No. Observations:  6623
Date:            Mon, Feb 16 2026
Time:            15:21:11

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1        h.2        h.3        h.4        h.5
2026-02-10  16.765399  16.353629  15.981271  15.644552  15.340061
2026-02-11  15.477566  15.189058  14.928163  14.692240  14.478897
2026-02-12  14.478288  14.285423  14.111017  13.953305  13.810687
2026-02-13  14.618437  14.412158  14.225622  14.056940  13.904403
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -12943.7
Distribution:    Normal               AIC:               25895.3
Method:          Maximum Likelihood   BIC:               25922.2
                                      No. Observations:  6091
Date:            Mon, Feb 16 2026     Df Residuals:      6090
Time:            15:21:11             Df Model:          1

Mean Model
          coef    std err    t       P>|t|       95.0% Conf. Int.
mu        0.1293  2.716e-02  4.761   1.931e-06   [7.607e-02, 0.183]

Volatility Model
          coef    std err    t       P>|t|       95.0% Conf. Int.
omega     0.1547  8.297e-02  1.864   6.234e-02   [-7.971e-03, 0.317]
alpha[1]  0.0836  2.840e-02  2.942   3.262e-03   [2.789e-02, 0.139]
beta[1]   0.8868  4.154e-02  21.347  4.167e-101  [ 0.805, 0.968]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -293219.
Distribution:    Normal                     AIC:               586446.
Method:          User-specified Parameters  BIC:               586473.
                                            No. Observations:  6094
Date:            Mon, Feb 16 2026
Time:            15:21:11

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1        h.2        h.3        h.4        h.5
2026-02-10  11.456505  11.271090  11.091178  10.916607  10.747218
2026-02-11  10.554557  10.395914  10.241981  10.092617  9.947687
2026-02-12  9.737062   9.602686   9.472299   9.345782   9.223020
2026-02-13  8.818160   8.711060   8.607140   8.506304   8.408461
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -18540.8
Distribution:    Normal               AIC:               37089.6
Method:          Maximum Likelihood   BIC:               37117.0
                                      No. Observations:  6971
Date:            Mon, Feb 16 2026     Df Residuals:      6970
Time:            15:21:11             Df Model:          1

Mean Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
mu        0.0638  4.071e-02  1.567   0.117      [-1.598e-02, 0.144]

Volatility Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
omega     0.0851  6.370e-02  1.336   0.182      [-3.974e-02, 0.210]
alpha[1]  0.0276  1.085e-02  2.547   1.087e-02  [6.372e-03,4.892e-02]
beta[1]   0.9669  1.469e-02  65.836  0.000      [ 0.938, 0.996]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -730737.
Distribution:    Normal                     AIC:               1.46148e+06
Method:          User-specified Parameters  BIC:               1.46151e+06
                                            No. Observations:  6974
Date:            Mon, Feb 16 2026
Time:            15:21:11

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1        h.2        h.3        h.4        h.5
2026-02-10  15.372898  15.374674  15.376440  15.378196  15.379943
2026-02-11  15.007516  15.011272  15.015008  15.018723  15.022419
2026-02-12  14.901144  14.905477  14.909787  14.914073  14.918335
2026-02-13  15.621601  15.622028  15.622453  15.622876  15.623296
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -18235.1
Distribution:    Normal               AIC:               36478.1
Method:          Maximum Likelihood   BIC:               36505.5
                                      No. Observations:  6834
Date:            Mon, Feb 16 2026     Df Residuals:      6833
Time:            15:21:11             Df Model:          1

Mean Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
mu        0.1177  4.038e-02  2.913   3.575e-03  [3.850e-02, 0.197]

Volatility Model
          coef    std err    t       P>|t|      95.0% Conf. Int.
omega     0.1294  9.690e-02  1.335   0.182      [-6.054e-02, 0.319]
alpha[1]  0.0407  1.935e-02  2.105   3.533e-02  [2.798e-03,7.865e-02]
beta[1]   0.9509  2.491e-02  38.167  0.000      [ 0.902, 1.000]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -683725.
Distribution:    Normal                     AIC:               1.36746e+06
Method:          User-specified Parameters  BIC:               1.36749e+06
                                            No. Observations:  6837
Date:            Mon, Feb 16 2026
Time:            15:21:11

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1        h.2        h.3        h.4        h.5
2026-02-10  15.527731  15.526324  15.524929  15.523545  15.522174
2026-02-11  14.966846  14.970164  14.973453  14.976714  14.979949
2026-02-12  15.737398  15.734225  15.731079  15.727959  15.724866
2026-02-13  15.601639  15.599609  15.597597  15.595602  15.593623
Constant Mean - GARCH Model Results
Dep. Variable:   Close                R-squared:         0.000
Mean Model:      Constant Mean        Adj. R-squared:    0.000
Vol Model:       GARCH                Log-Likelihood:    -1144.35
Distribution:    Normal               AIC:               2296.69
Method:          Maximum Likelihood   BIC:               2312.37
                                      No. Observations:  372
Date:            Mon, Feb 16 2026     Df Residuals:      371
Time:            15:21:11             Df Model:          1

Mean Model
          coef    std err  t          P>|t|  95.0% Conf. Int.
mu        0.4701  4.393    0.107      0.915  [ -8.141, 9.081]

Volatility Model
          coef    std err  t          P>|t|  95.0% Conf. Int.
omega     3.6103  203.967  1.770e-02  0.986  [-3.962e+02,4.034e+02]
alpha[1]  0.3482  7.219    4.823e-02  0.962  [-13.801, 14.497]
beta[1]   0.6518  4.283    0.152      0.879  [ -7.742, 9.046]

Covariance estimator: robust

'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable:   Close                      R-squared:         --
Mean Model:      Constant Mean              Adj. R-squared:    --
Vol Model:       GARCH                      Log-Likelihood:    -125551.
Distribution:    Normal                     AIC:               251110.
Method:          User-specified Parameters  BIC:               251126.
                                            No. Observations:  375
Date:            Mon, Feb 16 2026
Time:            15:21:11

Mean Model
          coef
mu        0.0235

Volatility Model
          coef
omega     0.0100
alpha[1]  0.0600
beta[1]   0.0000

Results generated with user-specified parameters. Std. errors not available when the model is not estimated.

Date        h.1        h.2        h.3        h.4        h.5
2026-02-10  10.592766  14.203046  17.813326  21.423607  25.033887
2026-02-11  10.591889  14.202169  17.812450  21.422730  25.033010
2026-02-12  10.591318  14.201598  17.811878  21.422159  25.032439
2026-02-13  10.590945  14.201226  17.811506  21.421786  25.032066
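The h.1 to h.5 columns in the tables above are variance forecasts one to five steps ahead. For a constant-mean GARCH(1,1) model, the one-step forecast is ω + α·ε² + β·σ², and each further step applies ω + (α + β) times the previous forecast. The block below is a minimal NumPy sketch of that recursion, not the `arch` library's implementation: the function names are hypothetical, the parameter values are illustrative, and starting the filter from the sample variance is an assumption.

```python
import numpy as np


def garch11_variance(returns, mu, omega, alpha, beta):
    """Filter conditional variances for a constant-mean GARCH(1,1) (sketch)."""
    resid = np.asarray(returns, dtype=float) - mu
    sigma2 = np.empty(len(resid))
    sigma2[0] = resid.var()  # assumption: initialise with the sample variance
    for t in range(1, len(resid)):
        sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    return resid, sigma2


def garch11_forecast(resid, sigma2, omega, alpha, beta, horizon=5):
    """Variance forecasts for steps 1..horizon from the last observation."""
    out = np.empty(horizon)
    out[0] = omega + alpha * resid[-1] ** 2 + beta * sigma2[-1]
    for h in range(1, horizon):
        out[h] = omega + (alpha + beta) * out[h - 1]
    return out


# Toy percentage returns, not real market data
rng = np.random.default_rng(0)
toy_returns = rng.normal(0.0, 1.5, size=500)
resid, sigma2 = garch11_variance(toy_returns, mu=0.0, omega=0.05, alpha=0.05, beta=0.90)
forecast = garch11_forecast(resid, sigma2, omega=0.05, alpha=0.05, beta=0.90)
print(forecast)
```

When α + β is close to 1, each step adds roughly ω to the previous forecast, which would explain the nearly constant increments from h.1 to h.5 in several of the tables.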
Statistical checks
The next analysis looks at the skewness and mode of the close prices, among other statistical measures. These help build a more detailed picture of each stock. Some analysts argue that positive skewness is a good indicator for buying. Other measures such as the mean and standard deviation are also useful and can support buy/sell decisions.
Code
from scipy.stats import skew, mode, gaussian_kde

for ticker, data in all_data.items():
    close_values = data['Close']

    # Stats
    close_mean = np.mean(close_values)
    close_median = np.median(close_values)
    mode_val = close_values.mode().iloc[0]

    # Use Seaborn's KDE to get values
    kde = sns.kdeplot(close_values)            # create a temporary plot
    kde_data = kde.get_lines()[0].get_data()   # extract x, y values
    x_range, y_values = kde_data
    plt.close(kde.figure)                      # close the temporary figure so no empty figure is rendered

    # Create Plotly figure
    fig = go.Figure()

    # KDE line
    fig.add_trace(go.Scatter(
        x=x_range,
        y=y_values,
        mode='lines',
        name='KDE',
        line=dict(color='blue')
    ))

    # Vertical lines
    fig.add_trace(go.Scatter(
        x=[close_mean, close_mean],
        y=[0, max(y_values)],
        mode='lines',
        line=dict(color='orange', dash='dash'),
        name='Mean'
    ))
    fig.add_trace(go.Scatter(
        x=[close_median, close_median],
        y=[0, max(y_values)],
        mode='lines',
        line=dict(color='black', dash='dash'),
        name='Median'
    ))
    fig.add_trace(go.Scatter(
        x=[mode_val, mode_val],
        y=[0, max(y_values)],
        mode='lines',
        line=dict(color='green', dash='dash'),
        name='Mode'
    ))

    # Layout
    fig.update_layout(
        title=f"Distribution of {ticker} Close Prices (Skewness)",
        xaxis_title="Price",
        yaxis_title="Density",
        width=800,
        height=500,
        template='plotly_white',
        legend=dict(title="Statistics")
    )
    fig.show()
Figure 6: panels (a) to (j)
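The plots mark the mean, median and mode, but the skewness itself can also be reported as a number. The block below is a minimal sketch using pandas on toy data; the helper name `describe_shape` is hypothetical and is not part of the project's code.

```python
import numpy as np
import pandas as pd


def describe_shape(values: pd.Series) -> dict:
    """Summarise location, spread and asymmetry of a series (hypothetical helper)."""
    values = values.dropna()
    return {
        "mean": values.mean(),
        "median": values.median(),
        "std": values.std(),
        # > 0: longer right tail, < 0: longer left tail
        "skewness": values.skew(),
    }


# Toy right-skewed "prices", not real market data
rng = np.random.default_rng(1)
toy_prices = pd.Series(rng.lognormal(mean=3.0, sigma=0.5, size=2000))
stats = describe_shape(toy_prices)
print(stats)
```

For a right-skewed sample like this one the mean sits above the median, which is the same pattern the vertical lines in the density plots make visible.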
XGBoost
XGBoost is the first of the ML models used in this project. It is one of the most popular gradient boosting implementations and is widely used for tabular and time series data. XGBoost is a fairly complicated model, so it is easier to understand the results than the model itself. The line plot for this model includes only the last 500 days of historical data for easier visualization.
Code
import xgboost as xgb
from sklearn.metrics import mean_squared_error

colors = px.colors.qualitative.Alphabet

# First it's important to go through the data and separate each feature for training
def create_features(df, label=None):
    df = df.copy()
    df['date'] = df.index
    df['date'] = pd.to_datetime(df['date'])
    df['hour'] = df['date'].dt.hour
    df['dayofweek'] = df['date'].dt.dayofweek
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['dayofyear'] = df['date'].dt.dayofyear
    df['dayofmonth'] = df['date'].dt.day
    # isocalendar() returns a nullable UInt32 column; cast it to a plain integer
    df['weekofyear'] = df['date'].dt.isocalendar().week.astype(int)

    X = df[['hour', 'dayofweek', 'quarter', 'month', 'year',
            'dayofyear', 'dayofmonth', 'weekofyear']]
    if label:
        y = df[label]
        return X, y
    return X

fig = go.Figure()

for i, (ticker, data) in enumerate(processed_data.items()):
    current_color = colors[i % len(colors)]
    data = data.sort_index()

    # Train on everything up to the split date, hold out the days after it
    split_date = '10-Feb-2026'
    stock_train = data.loc[data.index <= split_date].copy()
    stock_test = data.loc[data.index > split_date].copy()

    X_train, y_train = create_features(stock_train, label='Close')
    X_test, y_test = create_features(stock_test, label='Close')

    reg = xgb.XGBRegressor(n_estimators=1000, early_stopping_rounds=50)
    reg.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)],
            verbose=False)

    # Forecast the next 50 business days from calendar features only
    forecast_periods = 50
    data_recent = data.tail(500).copy()
    data_recent.index = pd.to_datetime(data_recent.index)
    data_recent = data_recent.sort_index()

    hist_x = data_recent.index
    future_start = hist_x[-1] + pd.Timedelta(days=1)
    future_dates = pd.date_range(start=future_start, periods=forecast_periods, freq='B')
    future_df = pd.DataFrame(index=future_dates)
    X_future = create_features(future_df)
    forecast = reg.predict(X_future)

    # Prepend the last historical point so the forecast line connects to the history
    last_hist_date = data_recent.index[-1]
    last_hist_close = data_recent['Close'].iloc[-1]
    plot_forecast_dates = pd.Index([last_hist_date]).append(future_dates)
    plot_forecast_values = np.concatenate(([last_hist_close], forecast))

    fig.add_trace(go.Scatter(
        x=data_recent.index,
        y=data_recent['Close'],
        mode='lines',
        name=f'Historical Market Close of {ticker}',
        line=dict(color=current_color)
    ))
    fig.add_trace(go.Scatter(
        x=plot_forecast_dates,
        y=plot_forecast_values,
        mode='lines',
        name=f'Predicted Future Close of {ticker}',
        line=dict(color=current_color, dash='dash')
    ))

fig.update_layout(
    title='Stock Close Price vs XGBoost Prediction',
    xaxis_title='Date',
    yaxis_title='Price',
    template='plotly_white'
)
fig.show()
Figure 7
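The code above imports `mean_squared_error` but does not report an error metric, so the forecast quality is judged by eye. One way to quantify it is to compare the model's error on a held-out window with a naive baseline that repeats the last known close. The block below is a minimal NumPy sketch of that comparison on toy numbers; `rmse` and `naive_forecast` are hypothetical helpers, and `model_forecast` is a placeholder standing in for the output of `reg.predict(...)`.

```python
import numpy as np


def rmse(actual, predicted) -> float:
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def naive_forecast(history, horizon: int) -> np.ndarray:
    """Persistence baseline: repeat the last observed close."""
    return np.full(horizon, float(history[-1]))


# Toy close prices, not real market data
closes = np.array([100, 101, 103, 102, 105, 107, 106, 108, 110, 111], dtype=float)
train, test = closes[:7], closes[7:]

baseline = naive_forecast(train, len(test))
model_forecast = np.array([107.5, 109.0, 111.5])  # placeholder for model output

baseline_rmse = rmse(test, baseline)
model_rmse = rmse(test, model_forecast)
print(f"baseline RMSE: {baseline_rmse:.3f}, model RMSE: {model_rmse:.3f}")
```

A model that does not beat the persistence baseline on held-out data is a sign that its forecasts should be read with caution.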
Results
This section is an overview of the results from the preceding analyses and forecasts. The markets are extremely volatile, and many of the stocks, most notably ASML, have risen sharply in value lately. This makes forecasting difficult, as the most recent data already suggests that some of the stocks are expected to fall.
From the first line chart (Figure 1) and the market trend analysis you can see which companies have the strongest trends. ASML has been performing exceptionally well, but its stock price is currently decreasing significantly. The other companies have similarly volatile stocks.
The MACD (Figure 3) and RSI (Figure 4) indicator analyses show that the markets are very volatile. The RSI plots swing heavily between oversold and overbought for most of the companies. This makes stock market analysis especially difficult and leaves the markets heavily exposed to speculation. One of the most 'stable' markets is that of AWEVF, partly because it is a new company.
From the GARCH model (Figure 5) and the related statistical analysis, you can determine the most important trends and characteristics of market volatility. In this case, the most important plots to look at are the 'estimated vs fixed volatility' and forecast plots (remember to scroll to the left to see the forecast). These plots can inform decisions on risk management, banking regulation, and derivatives management.
The statistical checks are somewhat optional, but measures such as the skewness in Figure 6 can provide useful detail about the stocks.
Finally, the project includes the time series forecast model, XGBoost (Figure 7). As the graphs show, XGBoost gives fairly realistic forecasts of the close prices. For some of the tickers, however, the model might be too primitive and produces unrealistic forecasts.
The goal of this project has been to provide a wide variety of tools and models for stock market analysis and forecasting that can be applied to trading, investing, portfolio management, etc. The models should be treated as experimental rather than as firmly accurate.